Inclusive basic block counting calculates cycles just like regular basic block counting, and then propagates them proportionately to each procedure's callers. The cycles a procedure obtains from regular basic block counting (called exclusive cycles) are divided among its callers in proportion to the number of times each caller called the procedure. For example, if sin(x) takes 1000 cycles, and its callers, procedures foo() and bar(), call sin(x) 25 and 75 times respectively, 250 cycles are attributed to foo() and 750 to bar(). By propagating cycles this way up the call graph, __start ends up with all the cycles counted in the program.
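The proportional split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code prof uses; the function name attribute_to_callers and the dictionary layout are assumptions made for the example.

```python
def attribute_to_callers(callee_cycles, call_counts):
    """Split a callee's cycles among its callers in proportion to call counts.

    callee_cycles: cycles charged to the callee (hypothetical input).
    call_counts:   mapping of caller name -> number of calls to the callee.
    """
    total_calls = sum(call_counts.values())
    if total_calls == 0:
        # The callee was never called, so there is nothing to propagate.
        return {caller: 0.0 for caller in call_counts}
    return {
        caller: callee_cycles * count / total_calls
        for caller, count in call_counts.items()
    }


# The sin(x) example from the text: 1000 cycles, 25 calls from foo(), 75 from bar().
shares = attribute_to_callers(1000, {"foo": 25, "bar": 75})
print(shares)  # {'foo': 250.0, 'bar': 750.0}
```

Applying the same split at each procedure, working from the leaves of the call graph toward __start, would give the inclusive totals described above.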
Note: Propagating times from a callee to its callers assumes that all calls are equivalent, so each call is charged an equal share of the callee's time. For some functions (sin(), for example), this assumption is plausible. For others (matrix multiply, for example), the assumption can be very misleading. If foo() calls matmult() 99 times for 2x2 matrices, while bar() calls it once for 100x100 matrices, the inclusive time report attributes 99% of matmult()'s time to foo(), even though almost all of the time actually derives from bar()'s one call.

If you are familiar with the gprof command, you will see the similarity. The output of prof is very similar to the traditional gprof output, except that inclusive basic block counting uses ideal cycles calculated by prof rather than real sampled execution time.
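The matmult() caveat in the note can be made concrete with a small calculation. The sketch below assumes a hypothetical cost model in which multiplying two n-by-n matrices costs n**3 units; real cycle counts will differ, but the shape of the discrepancy is the same.

```python
def matmult_cost(n):
    # Hypothetical cost model (an assumption for this example):
    # a naive n-by-n matrix multiply performs about n**3 multiply-adds.
    return n ** 3


# caller -> (number of calls to matmult(), matrix dimension used)
calls = {"foo": (99, 2), "bar": (1, 100)}

# Cost each caller actually causes under the assumed model.
actual = {caller: count * matmult_cost(n) for caller, (count, n) in calls.items()}
total_cycles = sum(actual.values())
total_calls = sum(count for count, _ in calls.values())

# What proportional propagation reports: the total split by call count only.
attributed = {
    caller: total_cycles * count / total_calls
    for caller, (count, _) in calls.items()
}

for caller in calls:
    print(
        f"{caller}: attributed {attributed[caller] / total_cycles:.1%}, "
        f"actual {actual[caller] / total_cycles:.1%}"
    )
```

Under this model foo() is charged 99% of matmult()'s cycles while causing well under 1% of them, which is the distortion the note warns about.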
The steps to obtain inclusive times are essentially the same as those explained in "Basic Block Counting." In addition, you must instrument your program (using pixie) with the -gprof option, and invoke prof with the -gprof option. For more information on gprof, see the gprof(1) reference page.